joint state
Further Details
A.1 Dataset Details The 20 micro-variations of the 5 macro-variations of the scene were created with the rule of swapping at least two furniture pieces and perturbing the positions of a subset of the other furniture pieces. The occurrences of various furniture objects in these 100 micro-variations are illustrated in Figure 1. Several furniture objects such as'Beanbag' and'Chair' occur more frequently with multiple instances in a some scenes while others such as'Table 03' occur less frequently. We also analyze the object categories of all objects in the original 6 'FRL-apartment' space recreations. We map each of the 92 objects to a semantic category and list the counts per semantic category in a histogram in Figure 1. Since these spaces have a large kitchen area, there is a larger ratio of kitchen objects such as'Kitchen utensil' and'Bowl'. Top down views of the 5 'macro variations' of the scenes are shown in Figure 1. These variations are 5 semantically plausible configurations of furniture in the space generated by a 3D artist. Each surface is annotated with a bounding box, enabling procedural placement of objects on the surfaces. For each of these 5 variations, we generate 20 additional variations, giving 105 scene layouts. Objects are procedurally added on furniture and surfaces using the annotated supporting surface and containment volume information provided by ReplicaCAD.
A Hetero-Associative Sequential Memory Model Utilizing Neuromorphic Signals: Validated on a Mobile Manipulator
Wang, Runcong, Wang, Fengyi, Cheng, Gordon
This paper presents a hetero-associative sequential memory system for mobile manipulators that learns compact, neuromorphic bindings between robot joint states and tactile observations to produce step-wise action decisions with low compute and memory cost. The method encodes joint angles via population place coding and converts skin-measured forces into spike-rate features using an Izhikevich neuron model; both signals are transformed into bipolar binary vectors and bound element-wise to create associations stored in a large-capacity sequential memory. To improve separability in binary space and inject geometry from touch, we introduce 3D rotary positional embeddings that rotate subspaces as a function of sensed force direction, enabling fuzzy retrieval through a softmax weighted recall over temporally shifted action patterns. On a Toyota Human Support Robot covered by robot skin, the hetero-associative sequential memory system realizes a pseudocompliance controller that moves the link under touch in the direction and with speed correlating to the amplitude of applied force, and it retrieves multi-joint grasp sequences by continuing tactile input. The system sets up quickly, trains from synchronized streams of states and observations, and exhibits a degree of generalization while remaining economical. Results demonstrate single-joint and full-arm behaviors executed via associative recall, and suggest extensions to imitation learning, motion planning, and multi-modal integration.
LiHRA: A LiDAR-Based HRI Dataset for Automated Risk Monitoring Methods
Plahl, Frederik, Katranis, Georgios, Mamaev, Ilshat, Morozov, Andrey
We present LiHRA, a novel dataset designed to facilitate the development of automated, learning-based, or classical risk monitoring (RM) methods for Human-Robot Interaction (HRI) scenarios. The growing prevalence of collaborative robots in industrial environments has increased the need for reliable safety systems. However, the lack of high-quality datasets that capture realistic human-robot interactions, including potentially dangerous events, slows development. LiHRA addresses this challenge by providing a comprehensive, multi-modal dataset combining 3D LiDAR point clouds, human body keypoints, and robot joint states, capturing the complete spatial and dynamic context of human-robot collaboration. This combination of modalities allows for precise tracking of human movement, robot actions, and environmental conditions, enabling accurate RM during collaborative tasks. The LiHRA dataset covers six representative HRI scenarios involving collaborative and coexistent tasks, object handovers, and surface polishing, with safe and hazardous versions of each scenario. In total, the data set includes 4,431 labeled point clouds recorded at 10 Hz, providing a rich resource for training and benchmarking classical and AI-driven RM algorithms. Finally, to demonstrate LiHRA's utility, we introduce an RM method that quantifies the risk level in each scenario over time. This method leverages contextual information, including robot states and the dynamic model of the robot. With its combination of high-resolution LiDAR data, precise human tracking, robot state data, and realistic collision events, LiHRA offers an essential foundation for future research into real-time RM and adaptive safety strategies in human-robot workspaces.
FTACT: Force Torque aware Action Chunking Transformer for Pick-and-Reorient Bottle Task
Watanabe, Ryo, Alvarez, Maxime, Ferreiro, Pablo, Savkin, Pavel, Sano, Genki
Manipulator robots are increasingly being deployed in retail environments, yet contact rich edge cases still trigger costly human teleoperation. A prominent example is upright lying beverage bottles, where purely visual cues are often insufficient to resolve subtle contact events required for precise manipulation. We present a multimodal Imitation Learning policy that augments the Action Chunking Transformer with force and torque sensing, enabling end-to-end learning over images, joint states, and forces and torques. Deployed on Ghost, single-arm platform by Telexistence Inc, our approach improves Pick-and-Reorient bottle task by detecting and exploiting contact transitions during pressing and placement. Hardware experiments demonstrate greater task success compared to baseline matching the observation space of ACT as an ablation and experiments indicate that force and torque signals are beneficial in the press and place phases where visual observability is limited, supporting the use of interaction forces as a complementary modality for contact rich skills. The results suggest a practical path to scaling retail manipulation by combining modern imitation learning architectures with lightweight force and torque sensing.
Improved Vehicle Maneuver Prediction using Game Theoretic Priors
Conventional maneuver prediction methods use some sort of classification model on temporal trajectory data to predict behavior of agents over a set time horizon. Despite of having the best precision and recall, these models cannot predict a lane change accurately unless they incorporate information about the entire scene. Level-k game theory can leverage the human-like hierarchical reasoning to come up with the most rational decisions each agent can make in a group. This can be leveraged to model interactions between different vehicles in presence of each other and hence compute the most rational decisions each agent would make. The result of game theoretic evaluation can be used as a "prior" or combined with a traditional motion-based classification model to achieve more accurate predictions. The proposed approach assumes that the states of the vehicles around the target lead vehicle are known. The module will output the most rational maneuver prediction of the target vehicle based on an online optimization solution. These predictions are instrumental in decision making systems like Adaptive Cruise Control (ACC) or Traxen's iQ-Cruise further improving the resulting fuel savings.
IK Seed Generator for Dual-Arm Human-like Physicality Robot with Mobile Base
Takamatsu, Jun, Kanehira, Atsushi, Sasabuchi, Kazuhiro, Wake, Naoki, Ikeuchi, Katsushi
Robots are strongly expected as a means of replacing human tasks. If a robot has a human-like physicality, the possibility of replacing human tasks increases. In the case of household service robots, it is desirable for them to be on a human-like size so that they do not become excessively large in order to coexist with humans in their operating environment. However, robots with size limitations tend to have difficulty solving inverse kinematics (IK) due to mechanical limitations, such as joint angle limitations. Conversely, if the difficulty coming from this limitation could be mitigated, one can expect that the use of such robots becomes more valuable. In numerical IK solver, which is commonly used for robots with higher degrees-of-freedom (DOF), the solvability of IK depends on the initial guess given to the solver. Thus, this paper proposes a method for generating a good initial guess for a numerical IK solver given the target hand configuration. For the purpose, we define the goodness of an initial guess using the scaled Jacobian matrix, which can calculate the manipulability index considering the joint limits. These two factors are related to the difficulty of solving IK. We generate the initial guess by optimizing the goodness using the genetic algorithm (GA). To enumerate much possible IK solutions, we use the reachability map that represents the reachable area of the robot hand in the arm-base coordinate system. We conduct quantitative evaluation and prove that using an initial guess that is judged to be better using the goodness value increases the probability that IK is solved. Finally, as an application of the proposed method, we show that by generating good initial guesses for IK a robot actually achieves three typical scenarios.
Generative Multi-Agent Q-Learning for Policy Optimization: Decentralized Wireless Networks
Q-learning is a widely used reinforcement learning (RL) algorithm for optimizing wireless networks, but faces challenges with large state-spaces. Recently proposed multi-environment mixed Q-learning (MEMQ) algorithm addresses these challenges by employing multiple Q-learning algorithms across multiple synthetically generated, distinct but structurally related environments, so-called digital cousins. In this paper, we propose a novel multi-agent MEMQ (M-MEMQ) for cooperative decentralized wireless networks with multiple networked transmitters (TXs) and base stations (BSs). TXs do not have access to global information (joint state and actions). The new concept of coordinated and uncoordinated states is introduced. In uncoordinated states, TXs act independently to minimize their individual costs and update local Q-functions. In coordinated states, TXs use a Bayesian approach to estimate the joint state and update the joint Q-functions. The cost of information-sharing scales linearly with the number of TXs and is independent of the joint state-action space size. Several theoretical guarantees, including deterministic and probabilistic convergence, bounds on estimation error variance, and the probability of misdetecting the joint states, are given. Numerical simulations show that M-MEMQ outperforms several decentralized and centralized training with decentralized execution (CTDE) multi-agent RL algorithms by achieving 55% lower average policy error (APE), 35% faster convergence, 50% reduced runtime complexity, and 45% less sample complexity. Furthermore, M-MEMQ achieves comparable APE with significantly lower complexity than centralized methods. Simulations validate the theoretical analyses.
A Multi-Agent Multi-Environment Mixed Q-Learning for Partially Decentralized Wireless Network Optimization
Q-learning is a powerful tool for network control and policy optimization in wireless networks, but it struggles with large state spaces. Recent advancements, like multi-environment mixed Q-learning (MEMQ), improves performance and reduces complexity by integrating multiple Q-learning algorithms across multiple related environments so-called digital cousins. However, MEMQ is designed for centralized single-agent networks and is not suitable for decentralized or multi-agent networks. To address this challenge, we propose a novel multi-agent MEMQ algorithm for partially decentralized wireless networks with multiple mobile transmitters (TXs) and base stations (BSs), where TXs do not have access to each other's states and actions. In uncoordinated states, TXs act independently to minimize their individual costs. In coordinated states, TXs use a Bayesian approach to estimate the joint state based on local observations and share limited information with leader TX to minimize joint cost. The cost of information sharing scales linearly with the number of TXs and is independent of the joint state-action space size. The proposed scheme is 50% faster than centralized MEMQ with only a 20% increase in average policy error (APE) and is 25% faster than several advanced decentralized Q-learning algorithms with 40% less APE. The convergence of the algorithm is also demonstrated.
Adaptive Visual Perception for Robotic Construction Process: A Multi-Robot Coordination Framework
Xu, Jia, Dixit, Manish, Wang, Xi
Construction robots operate in unstructured construction sites, where effective visual perception is crucial for ensuring safe and seamless operations. However, construction robots often handle large elements and perform tasks across expansive areas, resulting in occluded views from onboard cameras and necessitating the use of multiple environmental cameras to capture the large task space. This study proposes a multi-robot coordination framework in which a team of supervising robots equipped with cameras adaptively adjust their poses to visually perceive the operation of the primary construction robot and its surrounding environment. A viewpoint selection method is proposed to determine each supervising robot's camera viewpoint, optimizing visual coverage and proximity while considering the visibility of the upcoming construction robot operation. A case study on prefabricated wooden frame installation demonstrates the system's feasibility, and further experiments are conducted to validate the performance and robustness of the proposed viewpoint selection method across various settings. This research advances visual perception of robotic construction processes and paves the way for integrating computer vision techniques to enable real-time adaption and responsiveness. Such advancements contribute to the safe and efficient operation of construction robots in inherently unstructured construction sites.